Amazon Top 50 Bestselling Books

CONCEPT

The dataset covers Amazon's Top 50 bestselling books from 2009 to 2019. It contains 550 entries (the Top 50 for each of the 11 years), each categorised as fiction or non-fiction using Goodreads.

DATA

The dataset contains the columns Book Name, Author, User Rating, Reviews, Genre, Price, and Year, covering 2009 to 2019. Analysing the relationships among these columns shows how ratings, review counts, prices, genre, and year relate to one another.
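As a minimal sketch, assuming the data is stored as a CSV file with the headers Name, Author, User Rating, Reviews, Price, Year, and Genre (the exact header spellings and genre labels are assumptions), the rows could be parsed into typed records with Python's standard `csv` module. The `Bestseller` class, the `load_bestsellers` function, and the two sample rows are hypothetical placeholders, not real entries:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Bestseller:
    """One row of the dataset; fields mirror the columns listed above."""
    name: str
    author: str
    user_rating: float
    reviews: int
    price: float
    year: int
    genre: str


# Hypothetical sample in an assumed CSV layout; real headers may differ.
SAMPLE_CSV = """Name,Author,User Rating,Reviews,Price,Year,Genre
Book A,Author X,4.7,15000,12,2018,Fiction
Book B,Author Y,4.5,3000,25,2011,Non-Fiction
"""


def load_bestsellers(text: str) -> list[Bestseller]:
    """Parse CSV text into typed records, converting the numeric columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Bestseller(
            name=row["Name"],
            author=row["Author"],
            user_rating=float(row["User Rating"]),
            reviews=int(row["Reviews"]),
            price=float(row["Price"]),
            year=int(row["Year"]),
            genre=row["Genre"],
        )
        for row in reader
    ]


books = load_bestsellers(SAMPLE_CSV)
print(books[0])
```

With a real file, the same function can be called on the file's contents, for example after reading it via `open(path, newline="")`, where `path` is a placeholder for your local file name.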

Key Findings

1. Data Distribution

  • Ratings generally increase over the years, possibly reflecting improved book quality.
  • User ratings skew towards higher values, with most books rated between 4 and 5 stars.
  • Most books have fewer than 20,000 reviews, with a long tail of books accumulating very high review counts.
  • Prices vary, but most books are priced below $20.
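The distribution claims above reduce to a few simple summaries: the share of values inside or below a threshold, and a comparison of mean and median to detect a long right tail. The sketch below uses Python's `statistics` module on hypothetical toy values (not the real dataset) purely to show the calculations; `share_below` and `share_between` are illustrative helper names:

```python
from statistics import mean, median

# Hypothetical toy values, NOT taken from the real dataset.
ratings = [4.8, 4.6, 4.7, 3.9, 4.9, 4.5]
reviews = [2500, 8000, 12000, 15000, 19000, 87000]
prices = [8.0, 11.0, 13.0, 15.0, 22.0, 46.0]


def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


def share_between(values, low, high):
    """Fraction of values in the closed interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)


summary = {
    "ratings_4_to_5": share_between(ratings, 4.0, 5.0),
    "reviews_below_20000": share_below(reviews, 20_000),
    "prices_below_20": share_below(prices, 20.0),
    # A mean well above the median is one simple sign of a long right tail.
    "reviews_mean": mean(reviews),
    "reviews_median": median(reviews),
}
print(summary)
```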

2. Fiction and Non-Fiction

  • Non-fiction titles are more prevalent among bestsellers.
  • Non-fiction books have slightly higher median prices and wider price ranges.
  • User ratings differ little between fiction and non-fiction, but fiction ratings show wider variability.
  • Fiction books tend to accumulate more reviews, and their review counts span a wider range.
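Genre comparisons like these amount to grouping rows by genre and summarising each group. A minimal sketch with hypothetical toy rows (not real data), using the population standard deviation as one possible measure of rating variability and max minus min as the price range; `by_genre` and `describe` are illustrative names:

```python
from collections import defaultdict
from statistics import median, pstdev

# Hypothetical toy rows (genre, price, user_rating, reviews); NOT real data.
rows = [
    ("Fiction", 9.0, 4.8, 30000),
    ("Fiction", 12.0, 3.9, 4000),
    ("Fiction", 10.0, 4.7, 21000),
    ("Non-Fiction", 14.0, 4.6, 6000),
    ("Non-Fiction", 28.0, 4.7, 9000),
    ("Non-Fiction", 11.0, 4.5, 2000),
    ("Non-Fiction", 16.0, 4.6, 7000),
]


def by_genre(rows):
    """Collect each numeric column into per-genre lists."""
    groups = defaultdict(lambda: {"price": [], "rating": [], "reviews": []})
    for genre, price, rating, n_reviews in rows:
        groups[genre]["price"].append(price)
        groups[genre]["rating"].append(rating)
        groups[genre]["reviews"].append(n_reviews)
    return groups


def describe(groups):
    """Per-genre count, median price, price range, rating spread, median reviews."""
    return {
        genre: {
            "count": len(cols["price"]),
            "median_price": median(cols["price"]),
            "price_range": max(cols["price"]) - min(cols["price"]),
            "rating_stdev": pstdev(cols["rating"]),
            "median_reviews": median(cols["reviews"]),
        }
        for genre, cols in groups.items()
    }


stats = describe(by_genre(rows))
for genre, s in stats.items():
    print(genre, s)
```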

3. Relationships

  • A weak positive correlation exists between user ratings and review counts.
  • There is also a weak positive correlation between user rating and price.
  • There is a moderate negative correlation between review counts and price, indicating that lower-priced books tend to receive more reviews.
  • There is a moderate positive correlation between review counts and year, suggesting that more recent bestsellers tend to attract more reviews. Average review counts also rise clearly year on year from 2012 to 2019.
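The correlation statements above presumably refer to a coefficient such as Pearson's r; the source does not say which measure was used, so Pearson is an assumption here. The sketch computes r from its definition, labels its strength with hypothetical cut-offs (|r| below 0.3 weak, below 0.7 moderate, otherwise strong; a common rule of thumb, not necessarily the thresholds used in this analysis), and averages reviews per year. The columns are toy values, not the real dataset:

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strength(r):
    """Label r using hypothetical cut-offs (rule of thumb, not a standard)."""
    a = abs(r)
    label = "weak" if a < 0.3 else "moderate" if a < 0.7 else "strong"
    direction = "positive" if r >= 0 else "negative"
    return f"{label} {direction}"


def mean_by_year(years, values):
    """Average of values for each year, ordered by year."""
    buckets = defaultdict(list)
    for year, v in zip(years, values):
        buckets[year].append(v)
    return {year: mean(vs) for year, vs in sorted(buckets.items())}


# Hypothetical toy columns, NOT the real dataset.
years = [2012, 2012, 2013, 2013, 2014, 2014]
reviews = [3000, 5000, 6000, 8000, 9000, 13000]
prices = [20.0, 16.0, 14.0, 15.0, 10.0, 8.0]

r_year = pearson(years, reviews)
r_price = pearson(reviews, prices)
print(round(r_year, 3), strength(r_year))
print(round(r_price, 3), strength(r_price))
print(mean_by_year(years, reviews))
```

Note that `pearson` raises `ZeroDivisionError` if either column is constant, since r is undefined when a variable has no variance.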

Conclusion

  • Overall, these findings highlight the mix of factors that affect user ratings, reviews, and prices among bestselling books, including book quality, genre popularity, and pricing strategy. However, correlation does not imply causation. Although books with more reviews tend to have slightly higher ratings, one does not necessarily cause the other; other factors, such as a book's quality or the popularity of its genre, likely contribute significantly to these trends.

GitHub Repository


Data distribution

Fiction and Non-Fiction analysis

Relationships
